Stochastic models for document restructuration
Abstract
Document (re)structuration consists of mapping documents that come from different sources, in different formats, onto a predefined semi-structured format. This generic problem arises in several application settings, such as querying heterogeneous semi-structured databases, peer-to-peer systems, legacy document conversion, and XML information retrieval. In this paper, we define the restructuration problem from a document-centric perspective and identify the main challenges it raises. We then consider two instances of restructuration: structuring flat documents, and learning the correspondence between structured formats. We propose stochastic models for these two tasks and describe tests on a large XML document collection.
Similar resources
Restructuration automatique de documents dans les corpus semi-structurés hétérogènes
Abstract. Querying large collections of semi-structured documents (such as XML) is an important open problem. To query a document whose schema is new, a system must be able either to adapt the query to the document, or to adapt the document so that the query can be applied to it. We position ourselves here in the setting of document restructuration, which consi...
Modèle probabiliste pour l'extraction de structures dans les documents semistructurés - Application aux documents Web
With content management systems becoming mainstream, the Web has changed dramatically: more and more web pages are now generated from relational databases, and their design reflects the logical structure of documents. In this work, we show that there is enough information in the layout of a web document to capture the kind of data people are already producing in a more machine-friendly format. The...
Méthode de formation et de restructuration dynamique de coalitions d'agents fondée sur l'optimum de Pareto
This paper presents a coalition formation protocol for multi-agent systems, which finds a Pareto-optimal solution without aggregating the agents' preferences. We present an extension of this protocol that allows dynamic restructuration of coalitions. We also present a behavior model for agents that is well adapted to our coalition formation protocol. An application based on teaching scheduling has ...
Behavioral study of piston manufacturing plant through stochastic models
Pistons play a vital role in almost all types of vehicles. The present study is a behavioral study of a piston manufacturing plant. Manufacturing plants are complex repairable systems and therefore, it is difficult to evaluate the performance of a piston manufacturing plant using stochastic models. The stochastic model is an efficient performance evaluator for repairable systems. In...
Application of Stochastic Optimal Control, Game Theory and Information Fusion for Cyber Defense Modelling
The present paper develops an effective cyber defense model by applying game-theoretic approaches based on information fusion. We try to improve on previous models by applying stochastic optimal control and robust optimization techniques. Jump processes are applied to model different and complex situations in cyber games. Applying jump processes we propose some m...